Abstract. The technique of Schroeppel and Shamir (SICOMP, 1981) has long been
the most efficient way to trade space against time for the Subset Sum problem.
In the random-instance setting, however, improved tradeoffs exist. In
particular, the recently discovered dissection method of Dinur et al.
(CRYPTO 2012) yields a significantly improved space-time tradeoff curve for
instances with strong randomness properties. Our main result is that these
strong randomness assumptions can be removed, obtaining the same space-time
tradeoffs in the worst case. We also show that for small space usage the
dissection algorithm can be almost fully parallelized. Our strategy for dealing
with arbitrary instances is to instead inject the randomness into the
dissection process itself by working over a carefully selected but random
composite modulus, and to introduce explicit space-time controls into the
algorithm by means of a "bailout mechanism".



1. Introduction 

The protagonist of this paper is the Subset Sum problem. 

Definition 1.1. An instance $(a, t)$ of Subset Sum consists of a vector
$a \in \mathbb{Z}_{\ge 0}^n$ and a target $t \in \mathbb{Z}_{\ge 0}$. A solution
of $(a, t)$ is a vector $x \in \{0, 1\}^n$ such that
$\sum_{i=1}^{n} a_i x_i = t$.

The problem is NP-hard (in essence, Karp's formulation of the knapsack problem
[6]), and the fastest known algorithms take time and space that grow
exponentially in $n$. We will write $T$ and $S$ for the exponential factors and
omit the possible polynomial factors. The brute-force algorithm, with
$T = 2^n$ and $S = 1$, was beaten four decades ago, when Horowitz and Sahni [4]
gave a simple yet powerful meet-in-the-middle algorithm that achieves
$T = S = 2^{n/2}$ by halving the set arbitrarily, sorting the $2^{n/2}$ subsets
of each half, and then quickly scanning through the relevant pairs of subsets
that could sum to the target. Some years later, Schroeppel and Shamir [10]
improved the space requirement of the algorithm to $S = 2^{n/4}$ by designing a
novel way to list the half-sums in sorted order in small space. However, if
allowing only polynomial space, no better than the trivial time bound of
$T = 2^n$ is known. Whether the constant bases of the exponentials in these
bounds can be improved is a major open problem in the area of moderately
exponential algorithms [11].
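To make the meet-in-the-middle idea concrete, the following Python sketch
solves small instances in roughly $2^{n/2}$ time and space. It is a simplified
variant rather than the exact procedure of Horowitz and Sahni: it sorts only
one half and binary-searches it instead of scanning two sorted lists, and the
names `subset_sums` and `meet_in_the_middle` are illustrative choices.

```python
from bisect import bisect_left


def subset_sums(items):
    """Return (sum, bitmask) for every subset of `items`."""
    sums = [(0, 0)]
    for i, a in enumerate(items):
        # Each existing subset is extended by item i.
        sums += [(s + a, mask | (1 << i)) for s, mask in sums]
    return sums


def meet_in_the_middle(a, t):
    """Return x in {0,1}^n with sum(a_i * x_i) == t, or None if none exists."""
    n = len(a)
    half = n // 2
    left = sorted(subset_sums(a[:half]))   # sorted by subset sum
    left_sums = [s for s, _ in left]
    for s_right, mask_right in subset_sums(a[half:]):
        need = t - s_right
        j = bisect_left(left_sums, need)   # binary search in the sorted half
        if j < len(left_sums) and left_sums[j] == need:
            mask_left = left[j][1]
            x_left = [(mask_left >> i) & 1 for i in range(half)]
            x_right = [(mask_right >> i) & 1 for i in range(n - half)]
            return x_left + x_right
    return None
```

The table of left half-sums is what costs $2^{n/2}$ space; the Schroeppel-Shamir
refinement mentioned above avoids materializing it in full.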

The difficulty of finding faster algorithms, whether in polynomial or
exponential space, has motivated the study of space-time tradeoffs. From a
practical point of view, large space usage is often the bottleneck of
computation, and savings in space usage can have significant impact even if
they come at the cost of increasing the time requirement. This is because a
smaller-space algorithm can make a better use of fast cache memories and, in
particular, because a smaller-space algorithm often enables easier and more
efficient large-scale parallelization. Typically, one obtains a smooth
space-time tradeoff by combining the fastest exponential time algorithm with
the fastest polynomial space algorithm into a hybrid scheme that interpolates
between the two extremes. An intriguing question is then whether one can beat
the hybrid scheme at some point, that is, to get a faster algorithm at some
space budget; if one can break the hybrid bound somewhere, maybe one can break
it everywhere. For the Subset Sum problem, a hybrid scheme is obtained by first
guessing some $g$ elements of the solution, and then running the algorithm of
Schroeppel and Shamir for the remaining instance on $n - g$ elements. This
yields $T = 2^{(n+g)/2}$ and $S = 2^{(n-g)/4}$, for any $0 \le g \le n$, and
thereby the smooth tradeoff curve $S^2 T = 2^n$ for $1 \le S \le 2^{n/4}$. We
call this the Schroeppel-Shamir tradeoff.

Figure 1. Space-time tradeoff curves for the Subset Sum problem [10], [5]. The
space and time requirements are $S = 2^{\sigma n}$ and $T = 2^{\tau n}$,
omitting factors polynomial in the instance size $n$.

While the Schroeppel-Shamir tradeoff has remained unbeaten in the usual
worst-case sense, there has been remarkable recent progress in the
random-instance setting [5, 1, 3]. In a recent result, Dinur, Dunkelman,
Keller, and Shamir [3] gave a tradeoff curve that matches the
Schroeppel-Shamir tradeoff at the extreme points $S = 1$ and $S = 2^{n/4}$ but
is strictly better in between. The tradeoff is achieved by a novel dissection
method that recursively decomposes the problem into smaller subproblems in two
different "dimensions", the first dimension being the current subset of the $n$
items, and the other dimension being (roughly speaking) the bits of information
of each item. The algorithm of Dinur et al. runs in space $S = 2^{\sigma n}$
and time $T = 2^{\tau(\sigma) n}$ on random instances ($\tau(\sigma)$ is
defined momentarily). See Figure 1 for an illustration and comparison to the
Schroeppel-Shamir tradeoff. The tradeoff curve $\tau(\sigma)$ is piecewise
linear and determined by what Dinur et al. call the "magic sequence"
2, 4, 7, 11, 16, 22, ..., obtained as evaluations of
$\rho_\ell = 1 + \ell(\ell + 1)/2$ at $\ell = 1, 2, \ldots$.
Definition 1.2. Define $\tau : (0, 1] \to [0, 1]$ as follows. For
$\sigma \in (0, 1/2]$, let $\ell$ be the solution to
$1/\rho_{\ell+1} < \sigma \le 1/\rho_\ell$. Then

(1) $\tau(\sigma) = 1 - \frac{1}{\ell + 1} - \frac{\rho_\ell - 2}{\ell + 1}\,\sigma$.

If there is no such $\ell$, that is, if $\sigma > 1/2$, define
$\tau(\sigma) = 1/2$.

For example, at $\sigma = 1/8$, we have $\ell = 3$, and thereby
$\tau(\sigma) = 19/32$. Asymptotically, when $\sigma$ is small, $\ell$ is
essentially $\sqrt{2/\sigma}$ and $\tau(\sigma) \approx 1 - \sqrt{2\sigma}$.
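As a quick sanity check of Definition 1.2, the following Python sketch evaluates
$\tau$ exactly with rational arithmetic and compares it with the time exponent
of the Schroeppel-Shamir tradeoff $S^2 T = 2^n$ at the same space budget. The
function names are illustrative, and `schroeppel_shamir_exponent` simply
encodes $T = 2^{(1 - 2\sigma)n}$ for $\sigma \le 1/4$.

```python
from fractions import Fraction


def rho(l):
    """Magic sequence rho_l = 1 + l(l+1)/2: 2, 4, 7, 11, 16, 22, ..."""
    return 1 + l * (l + 1) // 2


def tau(sigma):
    """Time exponent tau(sigma) of Definition 1.2, for sigma in (0, 1]."""
    sigma = Fraction(sigma)
    if not 0 < sigma <= 1:
        raise ValueError("sigma must lie in (0, 1]")
    if sigma > Fraction(1, 2):
        return Fraction(1, 2)
    l = 1
    # Find the unique l with 1/rho(l+1) < sigma <= 1/rho(l).
    while sigma <= Fraction(1, rho(l + 1)):
        l += 1
    return 1 - Fraction(1, l + 1) - Fraction(rho(l) - 2, l + 1) * sigma


def schroeppel_shamir_exponent(sigma):
    """Time exponent of the hybrid tradeoff S^2 T = 2^n at space 2^(sigma n)."""
    return max(1 - 2 * Fraction(sigma), Fraction(1, 2))
```

For instance, `tau(Fraction(1, 8))` evaluates to `Fraction(19, 32)`, whereas the
hybrid scheme needs exponent $3/4$ at the same space budget.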

In this paper, we show that this space-time tradeoff result by Dinur et al. [5] 
can be made to hold also in the worst case: 

Theorem 1.3. For each $\sigma \in (0, 1]$ there exists a randomized algorithm
that solves the Subset Sum problem with high probability, and runs in
$O^*(2^{\tau(\sigma) n})$ time and $O^*(2^{\sigma n})$ space. The $O^*$
notation suppresses factors that are polynomial in $n$, and the polynomials
depend on $\sigma$.



To the best of our knowledge, Theorem 1.3 is the first improvement to the
Schroeppel-Shamir tradeoff in the worst-case setting. Here we should remark
that, in the random-instance setting, there are results that improve on both
the Schroeppel-Shamir and the Dinur et al. tradeoffs for certain specific
choices of the space budget $S$. In particular, Becker et al. give a
$2^{0.72n}$ time polynomial space algorithm and a $2^{0.291n}$ time exponential
space algorithm [1]. A natural question that remains is whether these two
results could be extended to the worst-case setting. Such an extension would be
a significant breakthrough (cf. [11]).

We also prove that the dissection algorithm lends itself to parallelization
very well. As mentioned before, a general guiding intuition is that algorithms
that use less space can be more efficiently parallelized. The following theorem
shows that, at least in the case of the dissection algorithm, this intuition
can be made formal: the smaller the space budget $\sigma$ is, the closer we can
get to full parallelization.

Theorem 1.4. The algorithm of Theorem 1.3 can be implemented to run in
$O^*(2^{\tau(\sigma) n}/P)$ parallel time on $P$ processors each using
$O^*(2^{\sigma n})$ space, provided $P \le 2^{(2\tau(\sigma) - 1)n}$.

When $\sigma$ is small, $\tau(\sigma) \approx 1 - \sqrt{2\sigma}$ and the bound
on $P$ is roughly $2^{(\tau(\sigma) - \sqrt{2\sigma})n}$. In other words we get
a linear speedup almost all the way up to $2^{\tau(\sigma) n}$ processors,
almost full parallelization.

1.1. Our contributions and overview of the proof. At a high level, our approach
will follow the Dinur et al. dissection framework, with essential differences
in preprocessing and low-level implementation to alleviate the assumptions on
randomness. In particular, while we split the instance analogously to Dinur et
al. to recover the tradeoff curve, we require more careful control of the
sub-instances beyond just subdividing the bits of the input integers and
assuming that the input is random enough to guarantee sufficient uniformity to
yield the tradeoff curve. Accordingly we find it convenient to revisit the
derivation of the tradeoff curve and the analysis of the basic dissection
framework to enable a self-contained exposition.

In contrast with Dinur et al., our strategy for dealing with arbitrary
instances is, essentially, to instead inject the required randomness into the
dissection process itself. We achieve this by observing that dissection can be
carried out over any algebraic structure that has a sufficiently rich family of
homomorphisms to enable us to inject entropy by selection of random
homomorphisms, while maintaining an appropriate recursive structure for the
selected homomorphisms to facilitate dissection. For the Subset Sum problem, in
practice this means reduction from $\mathbb{Z}$ to $\mathbb{Z}_M$ over a
composite $M$ with a carefully selected (but random) lattice of divisors to
make sure that we can still carry out recursive dissections analogously to
Dinur et al. This approach alone does not provide sufficient control over an
arbitrary problem instance, however.

The main obstacle is that, even with the randomness injected into the algorithm, 
it is very hard to control the resource consumption of the algorithm. To overcome 
this, we add explicit resource controls into the algorithm, by means of a somewhat 
cavalier "bailout mechanism" which causes the algorithm to simply stop when too 
many partial solutions have been generated. We set the threshold for such a bailout 
to be roughly the number of partial solutions that we would have expected to see 
in a random instance. This allows us to keep its running time and space usage in 
check, perfectly recovering the Dinur et al. tradeoff curve. The remaining challenge 
is then to prove correctness, i.e., that these thresholds for bailout are high enough 
so that no hazardous bailouts take place and a solution is indeed found. To do this 
we perform a localized analysis on the subtree of the recursion tree that contains 
a solution. Using that the constructed modulus $M$ contains a lot of randomness
(a consequence of the density of the primes), we can show that the probability
of a bailout in any node of this subtree is $o(1)$, meaning that the algorithm
finds a solution with high probability.

A somewhat curious effect is that in order for our analysis to go through, we
require the original Subset Sum instance to have few, say $O(1)$, distinct
solutions. In order to achieve this, we preprocess the instance by employing
routine isolation techniques in $\mathbb{Z}_P$ but implemented over
$\mathbb{Z}$ to control the number of solutions over $\mathbb{Z}$. The reason
why we need to implement the preprocessing over $\mathbb{Z}$ rather than work
in the modular setting is that the dissection algorithm itself needs to be able
to choose a modulus $M$ very carefully to deliver the tradeoff, and that choice
is incompatible with having an extra prime $P$ for isolation. This is somewhat
curious because, intuitively, the more solutions an instance has, the easier it
should be to find one. The reason why that is not the case in our setting is
that, further down in the recursion tree, when operating with a small modulus
$M$, every original solution gives rise to many additional spurious solutions,
and if there are too many original solutions there will be too many spurious
solutions.

A further property needed to support the analysis is that the numbers in the
Subset Sum instance must not be too large; in particular we need
$\log t = O(n)$. This we can also achieve by a simple preprocessing step where
we hash down modulo a random prime, but again with implementation over the
integers for the same reason as above.

1.2. Related work. The Subset Sum problem has recently been approached from
related angles, with the interest in small space. Lokshtanov and Nederlof [9]
show that the well-known pseudo-polynomial-time dynamic programming algorithm
can be implemented in truly-polynomial space by algebraization. Kaski,
Koivisto, and Nederlof [7] note that the sparsity of the dynamic programming
table can be exploited to speed up the computations even if allowing only
polynomial space.

Smooth space-time tradeoffs have been studied also for several other hard
problems. Björklund et al. [2] derive a hybrid scheme for the Tutte polynomial
that is a host of various counting problems on graphs. Koivisto and Parviainen
[8] consider a class of permutation problems (including, e.g., the traveling
salesman problem and the feedback arc set problem) and show that a natural
hybrid scheme can be beaten by a partial ordering technique.

1.3. Organization. In Section 2 we describe the dissection algorithm and give
the main statements about its properties. In Section 3 we show that the
algorithm runs within the desired time and space bounds. Then, in Section 4 we
show that given a Subset Sum instance with at most $O(1)$ solutions, the
dissection algorithm finds a solution. In Section 5 we give a standard
isolation argument reducing general Subset Sum to the restricted case when
there are at most $O(1)$ solutions, giving the last puzzle piece to complete
the proof of Theorem 1.3. In Section 6 we show that the algorithm lends itself
to efficient parallelization by proving Theorem 1.4.

2. The Main Dissection Algorithm 

Before describing the main algorithm, we condense some routine preprocessing
steps into the following theorem, whose proof we relegate to Section 5.

Theorem 2.1. There is a polynomial-time randomized algorithm for preprocessing
instances of Subset Sum which, given as input an instance $(a, t)$ with $n$
elements, outputs a collection of $O(n^3)$ instances $(a', t')$, each with $n$
elements and $\log t' = O(n)$, such that if $(a, t)$ is a NO instance then so
are all the new instances with probability $1 - o(1)$, and if $(a, t)$ is a YES
instance then with probability $\Omega(1)$ at least one of the new instances is
a YES instance with at most $O(1)$ solutions.

By applying this preprocessing we may assume that the main algorithm receives
an input $(a, t)$ that has $O(1)$ solutions and $\log t = O(n)$. We then
introduce a random modulus $M$ and transfer into a modular setting.

Definition 2.2. An instance $(a, t, M)$ of Modular Subset Sum consists of a
vector $a \in \mathbb{Z}_{\ge 0}^n$, a target $t \in \mathbb{Z}_{\ge 0}$, and a
modulus $M \in \mathbb{Z}_{\ge 1}$. A solution of $(a, t, M)$ is a vector
$x \in \{0, 1\}^n$ such that $\sum_{i=1}^{n} a_i x_i \equiv t \pmod{M}$.

The reason why we transfer to the modular setting is that the recursive
dissection strategy extensively uses the fact that we have available a
sufficiently rich family of homomorphisms to split the search space. In
particular, in the modular setting this corresponds to the modulus $M$ being
"sufficiently divisible" (in a sense to be made precise later) to obtain
control of the recursion.

Pseudocode for the main algorithm is given in Algorithm 1. In addition to the
modular instance $(a, t, M)$, the algorithm accepts as further input the space
parameter $\sigma \in (0, 1]$.

Algorithm 1: GenerateSolutions($a$, $t$, $M$, $\sigma$)

Data: $(a, t, M)$ is an $n$-element Modular Subset Sum instance, $\sigma \in (0, 1]$
Result: Iterates over up to $O^*(2^n/M)$ solutions of $(a, t, M)$ while using
        space $O^*(2^{\sigma n})$

 1 begin
 2   if $\sigma \ge 1/4$ then
 3     Report up to $O^*(2^n/M)$ solutions using the Schroeppel-Shamir algorithm
 4     return
 5   Choose $\alpha \in (0, 1)$, $\beta \in (0, 1)$ appropriately (according to
     Theorem 2.3) based on $\sigma$
 6   Let $M'$ be a factor of $M$ of magnitude $\Theta(2^{\beta n})$
 7   for $s' = 0, 1, \ldots, M' - 1$ do
 8     Allocate an empty lookup table
 9     Let $l = (a_1, a_2, \ldots, a_{\alpha n})$ be the first $\alpha n$ items of $a$
10     Let $r = (a_{\alpha n + 1}, a_{\alpha n + 2}, \ldots, a_n)$ be the remaining $(1 - \alpha)n$ items of $a$
11     for $y \in$ GenerateSolutions($l$, $s'$, $M'$, $\sigma/\alpha$) do
12       Let $s = \sum_{i=1}^{\alpha n} a_i y_i \bmod M$
13       Store $[s \to y]$ in the lookup table
14     for $z \in$ GenerateSolutions($r$, $t - s'$, $M'$, $\sigma/(1 - \alpha)$) do
15       Let $s = t - \sum_{i=\alpha n + 1}^{n} a_i z_{i - \alpha n} \bmod M$
16       foreach $[s \to y]$ in the lookup table do
17         Report solution $x = (y, z)$
18         if at least $\Theta^*(2^n/M)$ solutions reported then
19           Stop iteration and return
20     Release the lookup table

The key high-level idea in the algorithm is to "meet in the middle" by splitting
an instance of $n$ items to two sub-instances of $\alpha n$ items and
$(1 - \alpha)n$ items, guessing (over a smaller modulus $M'$ that divides $M$)
what the sum should be after the first and before the second sub-instance, and
then recursively solving the two sub-instances subject to the guess. Figure 2
illustrates the structure of the algorithm. We continue with some further
high-level remarks.

(1) In the algorithm, two key parameters $\alpha$ and $\beta$ are chosen, which
    control how the Modular Subset Sum instance is subdivided for the recursive
    calls. The precise choice of these parameters is given in Theorem 2.3
    below, but at this point the reader is encouraged to simply think of them
    as some parameters which should be chosen appropriately so as to optimize
    running time.

(2) The algorithm also chooses a factor $M'$ of $M$ such that
    $M' = \Theta(2^{\beta n})$. The existence of sufficient factors at all
    levels of recursion is established in Section 4.

(3) The algorithm should be viewed as an iterator over solutions. In other
    words, the algorithm has an internal state, and a next item functionality
    that we tacitly use by writing a for-loop over all solutions generated by
    the algorithm, which should be interpreted as a short-hand for repeatedly
    asking the iterator for the next item.

(4) The algorithm uses a "bailout mechanism" to control the running time and
    space usage. Namely, each recursive call will bail out after
    $O^*(2^n/M)$ solutions are reported. (The precise bailout bound has a
    further multiplicative factor polynomial in $n$ that depends on the
    top-level value of $\sigma$.) A preliminary intuition for the bound is that
    this is what one would expect to receive in a particular congruence class
    modulo $M$ if the $2^n$ possible sums are randomly placed into the
    congruence classes.
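The iterator view and the bailout can be illustrated with a toy Python
generator. This is only a sketch under simplifying assumptions that are not
part of the paper: the base case is brute force instead of Schroeppel-Shamir,
the split is always in the middle instead of at $\alpha n$, the chain of
moduli is passed in explicitly as `chain` (each entry dividing the previous
modulus), and the bailout threshold is the hypothetical `default_bailout`.

```python
from itertools import product


def brute_force(a, t, M, limit):
    """Stand-in base case: yield up to `limit` solutions of (a, t, M)."""
    found = 0
    for x in product((0, 1), repeat=len(a)):
        if (sum(ai * xi for ai, xi in zip(a, x)) - t) % M == 0:
            yield x
            found += 1
            if found >= limit:
                return


def default_bailout(n, M):
    """Assumed threshold of order 2^n / M with an arbitrary slack factor."""
    return 4 * max(1, 2 ** n // M)


def generate_solutions(a, t, M, chain, bailout=default_bailout):
    """Iterate over solutions of sum(a_i x_i) = t (mod M), with bailout."""
    n = len(a)
    limit = bailout(n, M)
    if not chain or n < 2:
        yield from brute_force(a, t, M, limit)
        return
    M_sub, rest = chain[0], chain[1:]
    if M % M_sub != 0:
        raise ValueError("each modulus in the chain must divide the previous one")
    split = n // 2                      # stand-in for alpha * n
    left, right = a[:split], a[split:]
    reported = 0
    for s_guess in range(M_sub):        # guess the left partial sum mod M_sub
        table = {}                      # lookup table [s -> list of y]
        for y in generate_solutions(left, s_guess, M_sub, rest, bailout):
            s = sum(ai * yi for ai, yi in zip(left, y)) % M
            table.setdefault(s, []).append(y)
        for z in generate_solutions(right, (t - s_guess) % M_sub, M_sub, rest, bailout):
            s = (t - sum(ai * zi for ai, zi in zip(right, z))) % M
            for y in table.get(s, ()):
                yield y + z
                reported += 1
                if reported >= limit:   # bailout: stop this call entirely
                    return
```

Every reported vector is a genuine solution modulo $M$ regardless of the
threshold; what the bailout can cost is completeness, which is exactly the
issue the correctness analysis has to address.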
Figure 2. Illustration of the recursive dissections made by the algorithm.

As a warmup to the analysis, let us first observe that, if we did not have the
bailout step in line 19, correctness of the algorithm would be more or less
immediate: for any solution $x$ of $(a, t, M)$, let
$s = \sum_{i=1}^{\alpha n} a_i x_i \bmod M$. Then, when $s' = s \bmod M'$ in the
outer for-loop (line 7), by an inductive argument we will find $y$ and $z$ in
the two separate recursive branches and join the two partial solutions to form
$x$.

The challenge, of course, is that without the bailout mechanism we lack control
over the resource consumption of the algorithm. Even though we have applied
isolation to guarantee that there are not too many solutions of the top-level
instance $(a, t)$, it may be that some branches of the recursion generate a
huge number of solutions, affecting both running time and space (since we store
partial solutions in a lookup table).

Let us then proceed to analyzing the algorithm with the bailout mechanism in 
place. The two main claims are as follows. 

Theorem 2.3. Given a space budget $\sigma \in (0, 1]$ and $M \ge 2^n$, if in
each recursive step of Algorithm 1 the parameters $\alpha$ and $\beta$ are
chosen as

(2) $\alpha = 1 - \tau(\sigma)$ and $\beta = 1 - \tau(\sigma) - \sigma$,

then the algorithm runs in $O^*(2^{\tau(\sigma) n})$ time and
$O^*(2^{\sigma n})$ space.

Theorem 2.4. For every $\sigma \in (0, 1]$ there is a randomized algorithm that
runs in time polynomial in $n$ and chooses a top-level modulus $M \ge 2^n$ so
that Algorithm 1 reports a solution of the non-modular instance $(a, t)$ with
high probability over the choices of $M$, assuming that at least one and at
most $O(1)$ solutions exist and that $\log t = O(n)$.
Algorithm 2: DummyDissection($n$, $\sigma$)

Data: $\sigma \in (0, 1]$

1 begin
2   if $\sigma \ge 1/4$ then
3     Run for $2^{n/2}$ steps
4     return
5   Let $\alpha = 1 - \tau(\sigma)$, $\beta = \alpha - \sigma$
6   for $2^{\beta n}$ steps do
7     DummyDissection($\alpha n$, $\sigma/\alpha$)
8     DummyDissection($(1 - \alpha)n$, $\sigma/(1 - \alpha)$)



We prove Theorem 2.3 in Section 3 and Theorem 2.4 in Section 4.

Let us however here briefly discuss the specific choice of $\alpha$ and $\beta$
in Theorem 2.3. We arrived at (2) by analyzing the recurrence relation
describing the running time of Algorithm 1. Unfortunately this recurrence in
its full form is somewhat complicated, and our process of coming up with (2)
involved a certain amount of experimenting and guesswork. We do have some
guiding (non-formal) intuition which might be instructive:

(1) One needs to make sure that $\alpha - \beta \le \sigma$. This is because
    for a random instance, the left subinstance is expected to have roughly
    $2^{(\alpha - \beta)n}$ solutions, and since we need to store these there
    had better be at most $2^{\sigma n}$ of them.

(2) Since $\beta \ge \alpha - \sigma$ and $\beta$ has a very direct impact on
    running time (due to the $2^{\beta n}$ time outer loop), one will typically
    want to set $\alpha$ relatively small. The tension here is of course that
    the smaller $\alpha$ becomes, the larger $1 - \alpha$ (that is, the size of
    the right subinstance) becomes.

(3) Given this tension, setting $\alpha - \beta = \sigma$ is natural.

So in an intuitive sense, the bottleneck for space comes from the left
subinstance, or rather the need to store all the solutions found for the left
subinstance (this is not technically true since we give the right subinstance
$2^{\sigma n}$ space allowance as well), whereas the bottleneck for time comes
from the right subinstance, which tends to be much larger than the left one.
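The parameter choice (2) is easy to tabulate. The Python sketch below computes
$\alpha$ and $\beta$ exactly from $\sigma$; the helper name `split_parameters`
is illustrative, and `tau` is a direct transcription of Definition 1.2.

```python
from fractions import Fraction


def rho(l):
    """Magic sequence rho_l = 1 + l(l+1)/2."""
    return 1 + l * (l + 1) // 2


def tau(sigma):
    """tau(sigma) from Definition 1.2, for sigma in (0, 1]."""
    sigma = Fraction(sigma)
    if sigma > Fraction(1, 2):
        return Fraction(1, 2)
    l = 1
    while sigma <= Fraction(1, rho(l + 1)):
        l += 1
    return 1 - Fraction(1, l + 1) - Fraction(rho(l) - 2, l + 1) * sigma


def split_parameters(sigma):
    """Return (alpha, beta) as in (2): alpha = 1 - tau, beta = alpha - sigma."""
    sigma = Fraction(sigma)
    alpha = 1 - tau(sigma)   # fraction of items sent to the left subinstance
    beta = alpha - sigma     # exponent of the guessed modulus M'
    return alpha, beta
```

At $\sigma = 1/20$ this gives $\alpha = 17/60$ and $\beta = 7/30$, so the left
subinstance holds under 30% of the items while the outer loop runs for about
$2^{0.233n}$ steps, and $\alpha - \beta$ equals $\sigma$ as item (3) suggests.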

3. Analysis of Running Time and Space Usage 



In this section we prove Theorem 2.3, giving the running time upper bound on
the dissection algorithm. For this, it is convenient to define the following
function, which is less explicit than $\tau$ but more naturally captures the
running time of the algorithm.

Definition 3.1. Define $F : (0, 1] \to (0, 1)$ by the following recurrence for
$\sigma < 1/4$:

(3) $F(\sigma) = \beta + \max\{\alpha F(\sigma/\alpha),\ (1 - \alpha) F(\sigma/(1 - \alpha))\}$,

where $\alpha = 1 - \tau(\sigma)$ and $\beta = \alpha - \sigma$. The base case
is $F(\sigma) = 1/2$ for $\sigma \ge 1/4$.

To analyze the running time of the dissection algorithm, let us first define a
"dummy" version of Algorithm 1, given as Algorithm 2. The dummy version is a
bare bones version of Algorithm 1 which generates the same recursion tree.

The following lemma is immediate from the definition of $F(\sigma)$.

Lemma 3.2. Algorithm 2 runs in $O^*(2^{F(\sigma) n})$ time on input
$(n, \sigma)$.

Next, we can relate the running time of the dummy algorithm to the running time
of the actual algorithm. Ignoring polynomial factors such as those arising from
updating the lookup table, the only time-consuming step of Algorithm 1 that we
have omitted in Algorithm 2 is the combination loop in steps 16 to 19. The
total amount of time spent in this loop in any fixed recursive call is, by
virtue of step 19, at most $O^*(2^n/M)$. So if $M \ge 2^{(1 - F(\sigma))n}$
then this time is dominated by the run time from the recursive calls. In other
words:

Lemma 3.3. Consider running Algorithm 1 on input $(a, t, M, \sigma)$. If in
every recursive call made it holds that $M \ge 2^{(1 - F(\sigma))n}$ then the
running time is within a polynomial factor of the running time of Algorithm 2
on input $(n, \sigma)$, that is, at most $O^*(2^{F(\sigma) n})$.

The next key piece is the following lemma, stating that the function $F$ is
nothing more than a reformulation of $\tau(\sigma)$. We defer the proof to
Section 3.1.

Lemma 3.4. For every $\sigma \in (0, 1]$ it holds that $F(\sigma) = \tau(\sigma)$.
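Lemma 3.4 can be spot-checked numerically before reading its proof. The Python
sketch below evaluates the recurrence (3) with exact rational arithmetic and
can be compared against $\tau$ on a grid of space budgets; this is a sanity
check on sampled values, not a substitute for the proof, and the memoization
via `lru_cache` is merely a convenience.

```python
from fractions import Fraction
from functools import lru_cache


def rho(l):
    """Magic sequence rho_l = 1 + l(l+1)/2."""
    return 1 + l * (l + 1) // 2


def tau(sigma):
    """tau(sigma) from Definition 1.2, for sigma in (0, 1]."""
    sigma = Fraction(sigma)
    if sigma > Fraction(1, 2):
        return Fraction(1, 2)
    l = 1
    while sigma <= Fraction(1, rho(l + 1)):
        l += 1
    return 1 - Fraction(1, l + 1) - Fraction(rho(l) - 2, l + 1) * sigma


@lru_cache(maxsize=None)
def F(sigma):
    """Running-time exponent defined by recurrence (3)."""
    sigma = Fraction(sigma)
    if sigma >= Fraction(1, 4):          # base case
        return Fraction(1, 2)
    alpha = 1 - tau(sigma)
    beta = alpha - sigma
    left = alpha * F(sigma / alpha)                # first option in the max
    right = (1 - alpha) * F(sigma / (1 - alpha))   # second option in the max
    return beta + max(left, right)
```

For example, `F(Fraction(1, 8))` unfolds to $9/32 + \max\{13/64, 10/32\} = 19/32$,
matching $\tau(1/8)$, with the maximum attained by the right branch.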

Equipped with this lemma, we are in good shape to prove Theorem 2.3.

Theorem 2.3 (restated). Given a space budget $\sigma \in (0, 1]$ and
$M \ge 2^n$, if in each recursive step of Algorithm 1 the parameters $\alpha$
and $\beta$ are chosen as

(4) $\alpha = 1 - \tau(\sigma)$ and $\beta = 1 - \tau(\sigma) - \sigma$,

then the algorithm runs in $O^*(2^{\tau(\sigma) n})$ time and
$O^*(2^{\sigma n})$ space.

Proof of Theorem 2.3. Let us start with space usage. There are three items to
bound: (1) the space usage in the left branch (step 11), (2) the space usage in
the right branch (step 14), and (3) the total number of solutions found in the
left branch (as these are all stored in a lookup table). For (1), the
subinstance $(l, s', M')$ has $\alpha n$ items and has a space budget of
$\sigma/\alpha$, so by an inductive argument it uses space
$O(2^{(\sigma/\alpha)\alpha n}) = O(2^{\sigma n})$. The case for (2) is
analogous. It remains to bound (3), which is clearly bounded by the number of
solutions found in the recursive step 11. However, by construction, this is (up
to a suppressed factor polynomial in $n$) at most
$2^{\alpha n}/M' = O(2^{(\alpha - \beta)n}) = O(2^{\sigma n})$.

We thus conclude that the total space usage of the algorithm is bounded by
$O^*(d\,2^{\sigma n})$ where $d$ is the recursion depth, which is $O(1)$ by
Lemma 4.2.

Let us turn to time usage. First, to apply Lemma 3.3, we need to make sure that
we always have $M \ge 2^{(1 - F(\sigma))n} = 2^{(1 - \tau(\sigma))n}$ in every
recursive call. In the top level call this is true since $M \ge 2^n$. Suppose
(inductively) that it is true in some recursive call, and let us prove that it
holds for both left- and right-recursive calls. We refer to the respective
values of the parameters by adding subscripts $l$ and $r$.

In a left-recursive call, we have $n_l = \alpha n$, $M_l = 2^{\beta n}$, and
$\sigma_l = \sigma/\alpha$. We thus need
$2^{\beta n} \ge 2^{(1 - \tau(\sigma/\alpha))\alpha n}$. Noting that
$1 - \tau(\sigma/\alpha) \le 1/2$ and that $\beta \ge \alpha/2$ (this is
equivalent to $\tau(\sigma) \le 1 - 2\sigma$), we see that $M_l$ is
sufficiently large.

In a right-recursive call, we have $n_r = (1 - \alpha)n = \tau(\sigma)n$,
$M_r = 2^{\beta n}$, and $\sigma_r = \sigma/(1 - \alpha) = \sigma/\tau(\sigma)$.
By Proposition 3.6 we have
$1 - \tau(\sigma_r) = (1 - \sigma - \tau(\sigma))/\tau(\sigma) = \beta/\tau(\sigma)$,
from which we conclude that $M_r = 2^{(1 - \tau(\sigma_r))n_r}$.

Thus the conditions of Lemma 3.3 are satisfied, and the running time bound of
$O^*(2^{\tau(\sigma) n})$ for Algorithm 1 is a direct consequence of Lemmata
3.2, 3.3, and 3.4. □
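3.1. Proof of Lemma 3.4. We first prove some useful properties of the $\tau$
function.

Proposition 3.5. The map $\sigma \mapsto \sigma/\tau(\sigma)$ is increasing in
$\sigma \in (0, 1]$. Furthermore, for $\sigma = 1/\rho_{\ell+1}$, we have
$\sigma/\tau(\sigma) = 1/\rho_\ell$.

Proof. Let $\sigma \in (0, 1]$, and let $\sigma' = \sigma/\tau(\sigma)$. If
$\sigma > 1/2$, then $\tau(\sigma) = 1/2$, and thus $\sigma' = 2\sigma$ is
increasing in $\sigma$. Otherwise $1/\rho_{\ell+1} < \sigma \le 1/\rho_\ell$
for some $\ell \ge 1$, and
$\tau(\sigma) = (\ell - (\rho_\ell - 2)\sigma)/(\ell + 1)$. Thus

  $\frac{1}{\sigma'} = \frac{\tau(\sigma)}{\sigma} = \frac{\ell - (\rho_\ell - 2)\sigma}{(\ell + 1)\sigma} = \frac{\ell/\sigma - \rho_\ell + 2}{\ell + 1}$,

from which it follows that $\sigma'$ is increasing in $\sigma$ in the interval
$(1/\rho_{\ell+1}, 1/\rho_\ell]$.

Suppose $\sigma = 1/\rho_{\ell+1}$. Use first
$\rho_{\ell+1} = \rho_\ell + \ell + 1$ and then
$\ell(\ell + 1) = 2(\rho_\ell - 1)$ to obtain

  $\frac{1}{\sigma'} = \frac{\ell(\rho_\ell + \ell + 1) - \rho_\ell + 2}{\ell + 1} = \frac{(\ell - 1)\rho_\ell + 2(\rho_\ell - 1) + 2}{\ell + 1} = \rho_\ell$. □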

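Proposition 3.6. Let $\sigma \in (0, 1]$. If $\sigma > 1/2$, then
$\tau(\sigma) = 1/2$, and otherwise

  $\tau(\sigma) = \frac{1 - \sigma}{2 - \tau(\sigma/\tau(\sigma))}$.

Proof. The case $\sigma > 1/2$ is obvious. Fix $\sigma \le 1/2$ and
$\ell \ge 1$ such that $1/\rho_{\ell+1} < \sigma \le 1/\rho_\ell$ and let
$\sigma' = \sigma/\tau(\sigma)$. By Proposition 3.5 we have that
$1/\rho_\ell < \sigma' \le 1/\rho_{\ell-1}$. Using
$\tau(\sigma') = (\ell - 1 - (\rho_{\ell-1} - 2)\sigma')/\ell$ we obtain

  $2 - \tau(\sigma') = \frac{\ell + 1 + (\rho_{\ell-1} - 2)\sigma'}{\ell}$.

Plugging in $\sigma' = \sigma/\tau(\sigma)$ and using
$\rho_{\ell-1} = \rho_\ell - \ell$ gives

(5) $2 - \tau(\sigma') = \frac{(\ell + 1)\tau(\sigma) + (\rho_\ell - \ell - 2)\sigma}{\ell\,\tau(\sigma)}$.

As $\tau(\sigma) = (\ell - (\rho_\ell - 2)\sigma)/(\ell + 1)$, the numerator of
this expression equals

  $\ell - (\rho_\ell - 2)\sigma + (\rho_\ell - \ell - 2)\sigma = \ell(1 - \sigma)$.

Plugging this into (5) we conclude that

  $2 - \tau(\sigma') = \frac{1 - \sigma}{\tau(\sigma)}$,

which is a simple rearrangement of the desired conclusion. □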
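Both propositions lend themselves to a quick exact check. The Python sketch
below defines the map $\sigma \mapsto \sigma/\tau(\sigma)$ of Proposition 3.5
and the right-hand side of Proposition 3.6; the helper names are illustrative.
For the identity of Proposition 3.6 the intended use is with
$\sigma \in (0, 1/4]$, which is the range on which recurrence (3) invokes it.

```python
from fractions import Fraction


def rho(l):
    """Magic sequence rho_l = 1 + l(l+1)/2."""
    return 1 + l * (l + 1) // 2


def tau(sigma):
    """tau(sigma) from Definition 1.2, for sigma in (0, 1]."""
    sigma = Fraction(sigma)
    if sigma > Fraction(1, 2):
        return Fraction(1, 2)
    l = 1
    while sigma <= Fraction(1, rho(l + 1)):
        l += 1
    return 1 - Fraction(1, l + 1) - Fraction(rho(l) - 2, l + 1) * sigma


def ratio(sigma):
    """The map sigma -> sigma / tau(sigma) studied in Proposition 3.5."""
    sigma = Fraction(sigma)
    return sigma / tau(sigma)


def prop_3_6_rhs(sigma):
    """(1 - sigma) / (2 - tau(sigma / tau(sigma))), for sigma in (0, 1/4]."""
    sigma = Fraction(sigma)
    return (1 - sigma) / (2 - tau(ratio(sigma)))
```

As a worked instance, $\sigma = 1/7 = 1/\rho_3$ gives $\tau(\sigma) = 4/7$, so
`ratio` returns $1/4 = 1/\rho_2$, and `prop_3_6_rhs` returns
$(6/7)/(3/2) = 4/7$ again.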
We are now ready to prove Lemma 3.4.

Lemma 3.4 (restated). For every $\sigma \in (0, 1]$ it holds that
$F(\sigma) = \tau(\sigma)$.

Proof of Lemma 3.4. The proof is by induction on the value of $\ell$ such that
$1/\rho_{\ell+1} < \sigma \le 1/\rho_\ell$. The base case, $\sigma \ge 1/4$
(that is, $\ell \le 1$) is clear from the definitions.

For the induction step, fix some value of $\ell \ge 2$, and assume that
$F(\sigma') = \tau(\sigma')$ for all $\sigma' \ge 1/\rho_\ell$. We need to show
that for any $\sigma$ in the interval $[1/\rho_{\ell+1}, 1/\rho_\ell)$, it
holds that $F(\sigma) = \tau(\sigma)$. To this end, we set
$\alpha = 1 - \tau(\sigma)$ and $\beta = 1 - \tau(\sigma) - \sigma$, and show
that the two options in the max in (3) are bounded by $\tau(\sigma)$, one with
equality.

Consider first the second option. Set
$\sigma' = \sigma/(1 - \alpha) = \sigma/\tau(\sigma)$. By Proposition 3.5 we
have $\sigma' \ge 1/\rho_\ell$. Thus, by the induction hypothesis we have
$F(\sigma') = \tau(\sigma')$, and hence the second option in (3) equals

  $\beta + (1 - \alpha)\tau(\sigma') = 1 - \tau(\sigma) - \sigma + \tau(\sigma)\,\tau(\sigma/\tau(\sigma)) = \tau(\sigma)$,

where the last step is an application of Proposition 3.6.

Consider then the first option. Let $\sigma'' = \sigma/\alpha$ be the value
passed to $F$ in this branch. It is easy to check that
$\sigma'' \ge \sigma' \ge 1/\rho_\ell$. So the induction hypothesis applies,
and we get an upper bound of

  $\beta + \alpha\tau(\sigma'') \le \beta + (1 - \alpha)\tau(\sigma') \le \tau(\sigma)$.

The first step uses $\tau(\sigma) \ge 1/2$ (yielding $\alpha \le 1/2$) and the
monotonicity of $\tau$, and the last step uses the bound on the second
option. □

Figure 3. The dissection tree $\mathcal{DT}(0.05)$. For each internal node $v$,
we display the parameters $\sigma_v$, $\tau_v = \tau(\sigma_v)$, $\alpha_v$,
$\beta_v$, $\gamma_v$ as defined in Section 4.

4. Choice of Modulus and Analysis of Correctness 



In this section we prove Theorem 2.4, giving the correctness of the dissection
algorithm.

4.1. The dissection tree. Now that we have the choice of $\alpha$ and $\beta$
in Algorithm 1, we can look more closely at the recursive structure of the
algorithm. To this end, we make the following definition.

Definition 4.1 (Dissection tree). For $\sigma \in (0, 1]$, the dissection tree
$\mathcal{DT}(\sigma)$ is the ordered binary tree defined as follows. If
$\sigma \ge 1/4$ then $\mathcal{DT}(\sigma)$ is a single node. Otherwise, let
$\alpha = 1 - \tau(\sigma)$. The left child of $\mathcal{DT}(\sigma)$ is
$\mathcal{DT}(\sigma/\alpha)$, and the right child of $\mathcal{DT}(\sigma)$ is
$\mathcal{DT}(\sigma/(1 - \alpha))$.

Figure 3 shows $\mathcal{DT}(0.05)$. The dissection tree captures the essence
of the recursive behaviour of the dissection algorithm when being run with
parameter $\sigma$. The actual recursion tree of the dissection algorithm is
huge due to the for-loop over $s'$ in line 7, but if we consider a fixed choice
of $s'$ in every recursive step then the recursion tree of the algorithm
becomes identical to the corresponding dissection tree.

Lemma 4.2. The recursion depth of Algorithm 1 is the height of
$\mathcal{DT}(\sigma)$. In particular, the recursion depth is a constant that
depends only on $\sigma$.
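Definition 4.1 translates directly into a few lines of Python. The sketch
below builds the dissection tree as nested tuples and measures its height; the
tuple representation and the function names are illustrative choices.

```python
from fractions import Fraction


def rho(l):
    """Magic sequence rho_l = 1 + l(l+1)/2."""
    return 1 + l * (l + 1) // 2


def tau(sigma):
    """tau(sigma) from Definition 1.2, for sigma in (0, 1]."""
    sigma = Fraction(sigma)
    if sigma > Fraction(1, 2):
        return Fraction(1, 2)
    l = 1
    while sigma <= Fraction(1, rho(l + 1)):
        l += 1
    return 1 - Fraction(1, l + 1) - Fraction(rho(l) - 2, l + 1) * sigma


def dissection_tree(sigma):
    """Return (sigma, left, right); leaves have left = right = None."""
    sigma = Fraction(sigma)
    if sigma >= Fraction(1, 4):
        return (sigma, None, None)
    alpha = 1 - tau(sigma)
    return (sigma,
            dissection_tree(sigma / alpha),
            dissection_tree(sigma / (1 - alpha)))


def height(tree):
    """Number of edges on a longest root-to-leaf path."""
    _, left, right = tree
    return 0 if left is None else 1 + max(height(left), height(right))


def internal_count(tree):
    """Number of internal (non-leaf) nodes."""
    _, left, right = tree
    return 0 if left is None else 1 + internal_count(left) + internal_count(right)
```

For $\sigma = 1/20$ the two children of the root carry $\sigma$ values $3/17$
and $3/43$, which round to the 0.1765 and 0.0698 of $\mathcal{DT}(0.05)$.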

We now describe how to choose a priori a random $M$ that is "sufficiently
divisible" for the algorithm's desires, and show correctness of the algorithm.

Fix a choice of the top-level value $\sigma \in (0, 1]$. Consider the
corresponding dissection tree $\mathcal{DT}(\sigma)$. For each node $v$ of
$\mathcal{DT}(\sigma)$, write $\sigma_v$ for the associated $\sigma$ value. For
an internal node $v$ let us also define $\alpha_v = 1 - \tau(\sigma_v)$ and
$\beta_v = 1 - \sigma_v - \tau(\sigma_v)$. In other words, if $v_1$ and $v_2$
are the two child nodes of $v$, then $\sigma_{v_1} = \sigma_v/\alpha_v$ and
$\sigma_{v_2} = \sigma_v/(1 - \alpha_v)$. Finally, define
$\gamma_v = \beta_v \cdot \sigma/\sigma_v$.

Observe that each recursive call made by Algorithm 1 is associated with a
unique internal node $v$ of the dissection tree $\mathcal{DT}(\sigma)$.

Lemma 4.3. Each recursive call associated with an internal node $v$ requires a
factor $M'$ of magnitude $\Theta^*(2^{\gamma_v n})$.

Proof. Telescope a product of the ratio $\sigma_p/\sigma_u$ for a node $u$ and
its parent $p$ along the path from $v$ to the root node. Each such
$\sigma_p/\sigma_u$ is either $\alpha_p$ or $1 - \alpha_p$ depending on whether
it is a left branch or right branch, which is precisely the factor by which $n$
decreases. Hence the call at $v$ has $(\sigma/\sigma_v)n$ items and needs a
factor of magnitude $2^{\beta_v (\sigma/\sigma_v) n} = 2^{\gamma_v n}$. □
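The telescoping argument can be traced in code. The Python sketch below walks
the dissection tree while tracking the fraction `size_v` of the original $n$
items that reaches each internal node, alongside $\beta_v$ and $\gamma_v$; the
generator name `internal_nodes` and the explicit stack are illustrative.

```python
from fractions import Fraction


def rho(l):
    """Magic sequence rho_l = 1 + l(l+1)/2."""
    return 1 + l * (l + 1) // 2


def tau(sigma):
    """tau(sigma) from Definition 1.2, for sigma in (0, 1]."""
    sigma = Fraction(sigma)
    if sigma > Fraction(1, 2):
        return Fraction(1, 2)
    l = 1
    while sigma <= Fraction(1, rho(l + 1)):
        l += 1
    return 1 - Fraction(1, l + 1) - Fraction(rho(l) - 2, l + 1) * sigma


def internal_nodes(sigma_top):
    """Yield (sigma_v, size_v, beta_v, gamma_v) for each internal node v."""
    sigma_top = Fraction(sigma_top)
    stack = [(sigma_top, Fraction(1))]   # (sigma_v, n_v / n)
    while stack:
        sigma_v, size_v = stack.pop()
        if sigma_v >= Fraction(1, 4):    # leaf: no further dissection
            continue
        alpha_v = 1 - tau(sigma_v)
        beta_v = alpha_v - sigma_v
        gamma_v = beta_v * sigma_top / sigma_v
        yield sigma_v, size_v, beta_v, gamma_v
        stack.append((sigma_v / alpha_v, size_v * alpha_v))              # left child
        stack.append((sigma_v / (1 - alpha_v), size_v * (1 - alpha_v)))  # right child
```

At every node the tracked size equals $\sigma/\sigma_v$, so the local exponent
$\beta_v n_v$ coincides with $\gamma_v n$; for $\sigma = 1/20$ the six values of
$\gamma_v$ are $7/30$, $7/90$, $11/60$, $11/180$, $2/15$ and $1/12$.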

4.2. Choosing the modulus. The following lemma contains the algorithm that 
chooses the random modulus. 

Lemma 4.4. For every $\sigma \in (0, 1]$ there exists a randomized algorithm that,
given integers $n$ and $b = O(n)$ as input, runs in time polynomial in $n$ and
outputs for each internal node $v \in \mathcal{DT}(\sigma)$ random moduli $M_v$,
$M'_v$ such that, for the root node $r \in \mathcal{DT}(\sigma)$, $M_r \ge 2^b$,
and furthermore for every internal node $v$:

(1) $M'_v$ is of magnitude $\Theta(2^{\gamma_v n})$,

(2) $M_v = M'_p$, where $p$ is the parent of $v$,

(3) $M'_v$ divides $M_v$, and

(4) for any fixed integer $1 \le Z \le 2^b$, the probability that $M'_v$ divides
    $Z$ is $O^*(1/M'_v)$.

Proof. Let $0 < \lambda_1 < \lambda_2 < \cdots < \lambda_k$ be the set of distinct
values of $\gamma_v$ ordered by value, and let
$\delta_i = \lambda_i - \lambda_{i-1}$ be their successive differences (where we
set $\lambda_0 = 0$ so that $\delta_1 = \lambda_1$). Since $\mathcal{DT}(\sigma)$
depends only on $\sigma$ and not on $n$, we have $k = O(1)$. For each
$1 \le i \le k$ independently, let $p_i$ be a uniform random prime from the
interval $[2^{\delta_i n}, 2 \cdot 2^{\delta_i n}]$.

For a node $v$ such that $\gamma_v = \lambda_j$, let
$M'_v = \prod_{i=1}^{j} p_i$. Condition 1 then holds by construction. The values
of $M_v$ are then determined for all nodes except the root through condition 2;
for the root node $r$ we set $M_r = p_0 M'_r$, where $p_0$ is a random prime of
magnitude $2^{\Theta(n)}$ to make sure that $M_r \ge 2^b$.

To prove condition 3, note that for any node $v$ with parent $p$, we need to prove
that $M'_v$ divides $M'_p$. Let $j_v$ be such that $\lambda_{j_v} = \gamma_v$ and
$j_p$ such that $\lambda_{j_p} = \gamma_p$. Noting that the value of $\gamma_v$
decreases as one goes down the dissection tree, it then holds that
$j_v \le j_p$, from which it follows that $M'_v = \prod_{i=1}^{j_v} p_i$ divides
$M'_p = \prod_{i=1}^{j_p} p_i$.






Finally, for condition 4, again let $j$ be such that $\lambda_j = \gamma_v$, and
observe that in order for $M'_v$ to divide $Z$, the integer $Z$ must have all the
factors $p_1, p_2, \ldots, p_j$. For each $1 \le i \le j$, $Z$ can have at most
$\log_2 Z / (\delta_i n) = O(1)$ different prime factors between
$2^{\delta_i n}$ and $2 \cdot 2^{\delta_i n}$, so by the Prime Number Theorem, the
probability that $p_i$ divides $Z$ is at most $O(n 2^{-\delta_i n})$. As the
$p_i$'s are chosen independently, the probability that all of
$p_1, p_2, \ldots, p_j$ divide $Z$ (that is, that $M'_v$ divides $Z$) is
$O(n^j 2^{-(\delta_1 + \delta_2 + \cdots + \delta_j) n}) = O(n^k 2^{-\gamma_v n})
= O^*(1/M'_v)$, as desired. □
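The construction in the proof can be sketched in a few lines. This is an
illustrative sketch rather than the paper's implementation: the helper names
(`is_prime`, `random_prime`, `choose_moduli`) are hypothetical, trial division
stands in for a real primality test and is adequate only for toy sizes, and
$\delta_i n$ is rounded to an integer exponent.

```python
import random
from fractions import Fraction

def is_prime(m):
    """Trial division; adequate only for the small demo sizes used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def random_prime(lo, hi, rng):
    """Uniform random prime in [lo, hi], found by rejection sampling."""
    while True:
        c = rng.randint(lo, hi)
        if is_prime(c):
            return c

def choose_moduli(gammas, n, rng):
    """Map each distinct gamma value lambda_j to M' = p_1 * ... * p_j."""
    moduli, prod, prev = {}, 1, Fraction(0)
    for lam in sorted(set(gammas)):
        delta = lam - prev             # delta_i = lambda_i - lambda_{i-1}
        lo = 2 ** round(delta * n)     # assumes delta_i * n is close to an integer
        prod *= random_prime(lo, 2 * lo, rng)
        moduli[lam] = prod
        prev = lam
    return moduli
```

Because every modulus is a prefix product of the same sequence of primes, the
modulus for a smaller $\gamma$ value always divides the modulus for a larger
one, which is the content of condition 3.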

4.3. Proof of correctness. We are now ready to prove the correctness of the
entire algorithm, assuming preprocessing and isolation have been carried out.



Theorem 2.4 (restated). For every $\sigma \in (0, 1]$ there is a randomized
algorithm that runs in time polynomial in $n$ and chooses a top-level modulus
$M \ge 2^n$ so that Algorithm 1 reports a solution of the non-modular instance
$(a, t)$ with high probability over the choices of $M$, assuming that at least one
and at most $O(1)$ solutions exist and that $\log t = O(n)$.

Proof. The modulus $M$ is chosen using Lemma 4.4, with $b$ set to
$\max\{n, \log nt\} = O(n)$. Specifically, it is chosen as $M_r$ for the root node
$r$ of $\mathcal{DT}(\sigma)$.

Fix a solution $x^*$ of $(a, t)$, that is, $\sum_{i=1}^{n} a_i x^*_i = t$. (Note
that this is an equality over the integers and not a modular congruence.) By
assumption such an $x^*$ exists and there are at most $O(1)$ choices.

If $\sigma \ge 1/4$, the top-level recursive call executes the Schroeppel-Shamir
algorithm and a solution will be discovered. So suppose that
$\sigma \in (0, 1/4)$.

For an internal node $v \in \mathcal{DT}(\sigma)$ consider a recursive call
associated with $v$, and let $L_v \subseteq [n]$ (resp. $R_v \subseteq [n]$) be
the set of $\alpha_v n_v$ (resp. $(1 - \alpha_v) n_v$) indices of the items that
are passed to the left (resp. right) recursive subtree of $v$. Note that these
indices are with respect to the top-level instance, and that they do not depend on
the choices of $s'$ made in the recursive calls. Let
$s'_v \in \{0, \ldots, M'_v - 1\}$ be the choice of $s'$ that could lead to the
discovery of $x^*$, in other words
$s'_v = \sum_{i \in L_v} a_i x^*_i \bmod M'_v$. Let $I_v = L_v \cup R_v$.

For a leaf node $v \in \mathcal{DT}(\sigma)$ and its parent $p$, define
$I_v = L_p$ if $v$ is a left child of $p$, and $I_v = R_p$ if $v$ is a right child
of $p$.

We now restrict our attention to the part of the recursion tree associated with
the discovery of $x^*$, or in other words, the recursion tree obtained by fixing
the value of $s'$ to $s'_v$ in each recursive step, rather than trying all
possibilities. This restricted recursion tree is simply $\mathcal{DT}(\sigma)$.
Thus the set of items $a_v = (a_i)_{i \in I_v}$ and the target $t_v$ associated
with $v$ is well-defined for all $v \in \mathcal{DT}(\sigma)$.

Denote by $B(v)$ the event that $(a_v, t_v, M_v)$ has more than
$O^*(2^{n_v}/M_v)$ solutions. Clearly, if $B(v)$ does not happen then there cannot
be a bailout at node $v$.^1 We will show that
$\bigcup_{v \in \mathcal{DT}(\sigma)} B(v)$ happens with probability $o(1)$ over
the choices of $\{M_v, M'_v\}$ from Lemma 4.4, which thus implies that $x^*$ is
discovered with probability $1 - o(1)$. Because $\mathcal{DT}(\sigma)$ has $O(1)$
nodes, by the union bound it suffices to show that $\Pr[B(v)] = o(1)$ for every
$v \in \mathcal{DT}(\sigma)$.

Consider an arbitrary node $v \in \mathcal{DT}(\sigma)$. There are two types of
solutions $x_v$ of the instance $(a_v, t_v, M_v)$ associated with $v$.

First, a vector $x_v \in \{0, 1\}^{n_v}$ is a solution if
$\sum_{i=1}^{n_v} a_{v,i} x_{v,i} = \sum_{i \in I_v} a_i x^*_i$. (Note that this
is an equality over the integers, not a modular congruence.) Because there are at
most $O(1)$ solutions to the top-level instance, there are at most $O(1)$ such
vectors $x_v$. Indeed, otherwise we would have more than $O(1)$ solutions of the
top-level instance, a contradiction.

^1 The converse is not true though: it can be that $B(v)$ happens but a bailout
happens in one (or both) of the two subtrees of $v$, causing the recursive call
associated with node $v$ to not find all the solutions to $(a_v, t_v, M_v)$ and
thereby not bail out.

Second, consider a vector $x_v \in \{0, 1\}^{n_v}$ such that
$\sum_{i=1}^{n_v} a_{v,i} x_{v,i} \ne \sum_{i \in I_v} a_i x^*_i$ (over the
integers). Let
$Z = |\sum_{i=1}^{n_v} a_{v,i} x_{v,i} - \sum_{i \in I_v} a_i x^*_i| \ne 0$. Such
a vector $x_v$ is a solution of $(a_v, t_v, M_v)$ only if $M_v$ divides $Z$. Since
$\log t = O(n)$ and $1 \le Z \le nt$, by Lemma 4.4, item 4, we have that $Z$ is
divisible by $M_v$ with probability $O^*(1/M_v)$.

From the two cases it follows that the expected number of solutions $x_v$ of
$(a_v, t_v, M_v)$ is $E = O^*(2^{n_v}/M_v)$. (We remark that the degree in the
suppressed polynomial depends on $\sigma$ but not on $n$.) Setting the precise
bailout threshold to $n \cdot E$, we then have by Markov's inequality that
$\Pr[B(v)] = \Pr[\#\text{solutions } x_v > nE] < 1/n = o(1)$, as desired. Since
$v$ was arbitrary, we are done. □
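The distinction between the two types of solutions can be made concrete on a toy
instance. The sketch below is an illustration only: `classify_solutions` is a
hypothetical helper that brute-forces all 0/1 vectors, which is feasible only for
very small $n$ and is not how the dissection algorithm enumerates solutions.

```python
from itertools import product

def classify_solutions(a, t, M):
    """Split the 0/1 vectors x with a.x = t (mod M) into two groups:
    `exact` holds vectors with a.x = t over the integers, and
    `spurious` holds vectors that satisfy the congruence only because
    M happens to divide Z = |a.x - t| != 0.
    """
    exact, spurious = [], []
    for x in product((0, 1), repeat=len(a)):
        s = sum(ai * xi for ai, xi in zip(a, x))
        if (s - t) % M == 0:
            (exact if s == t else spurious).append(x)
    return exact, spurious
```

For example, with `a = [3, 5, 8, 13]` and `t = 16` a small modulus such as
`M = 5` admits spurious solutions, while any modulus larger than every possible
value of $Z$ admits none.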

5. Preprocessing and Isolation 
This section proves Theorem 2.1 using standard isolation techniques.

Theorem 2.1 (restated). There is a polynomial-time randomized algorithm for
preprocessing instances of Subset Sum which, given as input an instance $(a, t)$
with $n$ elements, outputs a collection of $O(n^3)$ instances $(a', t')$, each
with $n$ elements and $\log t' = O(n)$, such that if $(a, t)$ is a NO instance
then so are all the new instances with probability $1 - o(1)$, and if $(a, t)$ is
a YES instance then with probability $\Omega(1)$ at least one of the new instances
is a YES instance with at most $O(1)$ solutions.

Proof. We carry out the preprocessing in two stages. Each stage considers its input 
instances (a, t) one at a time and produces one or more instances (a',t') for the 
next stage, the output of the second stage being the output of the procedure. 

The first stage takes as input the instance $(a, t)$ given as input to the
algorithm. Without loss of generality we may assume that $(a, t)$ satisfies
$a_i \le t$ for all $i = 1, 2, \ldots, n$. Indeed, we may simply remove all
elements $i$ with $a_i > t$. Hence $0 \le \sum_{i=1}^{n} a_i x_i \le nt$ for all
$x \in \{0, 1\}^n$. A further immediate observation is that we may assume that
$\log nt \le 2^n$. Indeed, otherwise we can do an exhaustive search over all the
$2^n$ subsets of the input integers in polynomial time in the input size (and then
output a trivial YES or NO instance based on the outcome without proceeding to the
second stage). Next, select a uniform random prime $P$ with, say, $3n + 1$ bits.
For each $k = 0, 1, 2, \ldots, n - 1$, form one instance $(a', t')$ by setting
$t' = t \bmod P + kP$ and $a'_i = a_i \bmod P$ for $i = 1, 2, \ldots, n$. Observe
that every solution of $(a, t)$ is a solution of $(a', t')$ for at least one value
of $k$. We claim that with high probability each of the $n$ instances $(a', t')$
has no other solutions beyond the solutions of $(a, t)$.

Consider an arbitrary vector $x \in \{0, 1\}^n$ that is not a solution of
$(a, t)$ but is a solution of $(a', t')$. This happens only if $P$ divides
$Z = |t - \sum_{i=1}^{n} a_i x_i| \ne 0$. Let us analyze the probability for the
event that $P$ divides $Z$. Since $Z \le nt$ has at most $2^n$ bits (recall that
$\log nt \le 2^n$), there can be at most $2^n/(3n)$ primes with $3n + 1$ bits that
divide $Z$. By the Prime Number Theorem we know that there are
$\Omega(2^{3n+1}/n)$ primes with $3n + 1$ bits. Since $P$ is a uniform random
prime with $3n + 1$ bits, we have that $P$ divides $Z$ with probability
$O(2^{-2n} n^2)$. By linearity of expectation, the expected number of vectors
$x \in \{0, 1\}^n$ that are not solutions of $(a, t)$ but are solutions of
$(a', t')$ is thus $O(2^{-n} n^2)$. By an application of Markov's inequality and
the union bound, with probability $1 - o(1)$ each of the $n$ instances $(a', t')$
has no other solutions beyond the solutions of $(a, t)$. By construction,
$\log t' = O(n)$. This completes the first stage.
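As a rough illustration of the first stage, the sketch below reduces an instance
modulo a random prime and emits one instance per value of $k$. The names
`first_stage` and `random_prime_with_bits` are hypothetical, and trial division
is used as a stand-in primality test that is practical only for small $n$.

```python
import random

def is_prime(m):
    """Trial division; adequate only for the small demo sizes used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def random_prime_with_bits(bits, rng):
    """Uniform random prime with exactly `bits` bits (rejection sampling)."""
    lo, hi = 2 ** (bits - 1), 2 ** bits - 1
    while True:
        c = rng.randint(lo, hi)
        if is_prime(c):
            return c

def first_stage(a, t, rng):
    """Return the n instances (a', t') with a'_i = a_i mod P, t' = t mod P + k*P."""
    a = [ai for ai in a if ai <= t]   # items larger than t can never be used
    n = len(a)
    if n == 0:
        return []
    P = random_prime_with_bits(3 * n + 1, rng)
    a_mod = [ai % P for ai in a]
    return [(a_mod, t % P + k * P) for k in range(n)]
```

Any solution of $(a, t)$ sums to a value congruent to $t$ modulo $P$ and smaller
than $nP$, so it reappears as a solution of exactly one of the returned
instances.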

The second stage controls the number of solutions by a standard isolation
technique. Consider an instance $(a, t)$ input to the second stage. Assume that
the set of all solutions $S \subseteq \{0, 1\}^n$ of $(a, t)$ is nonempty and
guess that it has size in the range $2^s \le |S| \le 2^{s+1}$ for
$s = 0, 1, \ldots, n - 1$. (That is, we try out all values and at least one will
be the correct guess.) Select (arbitrarily) a prime $P$ in the interval
$2^s \le P \le 2^{s+1}$. Select $r_1, r_2, \ldots, r_n$ and $u$ independently and
uniformly at random from $\{0, 1, \ldots, P - 1\}$.

For any fixed $x \in S$, we have that

(6)    $\sum_{i=1}^{n} r_i x_i \equiv u \pmod{P}$

holds with probability $1/P$ over the random choices of
$r_1, r_2, \ldots, r_n, u$. Similarly, any distinct $x, x' \in S$ both satisfy (6)
with probability $1/P^2$.

Fix a correct guess of $s$, so that $1 \le |S|/P \le 2$, and let the random
variable $S_P$ be the number of solutions in $S$ that also satisfy (6). Letting
$\lambda = |S|/P$ we then have

    $E[S_P] = \lambda$  and  $E[S_P^2] = E[S_P] + |S|(|S| - 1)/P^2 \le \lambda + \lambda^2$,

so the first and second moment methods give

    $\Pr[S_P > 10] \le E[S_P]/10 = \lambda/10 \le 1/5$  and
    $\Pr[S_P > 0] \ge E[S_P]^2 / E[S_P^2] \ge \lambda/(1 + \lambda) \ge 1/2$.

By a union bound, we have that for this correct guess of $s$ at least 1 and at
most 10 of the solutions in $S$ satisfy (6) with probability at least 1/4.
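The arithmetic behind the constant 1/4 can be checked with exact fractions. This
is a small sanity check, not part of the proof: it assumes the standard
second-moment bound $\Pr[X > 0] \ge E[X]^2/E[X^2]$ together with the estimate
$E[S_P^2] \le \lambda + \lambda^2$, and the function name `isolation_bounds` is
hypothetical.

```python
from fractions import Fraction

def isolation_bounds(lam):
    """Bounds for lam = |S|/P from the first and second moment methods.

    Returns (too_many, nonzero), where too_many is an upper bound on
    Pr[S_P > 10] and nonzero is a lower bound on Pr[S_P > 0].
    """
    too_many = lam / 10                       # Markov's inequality
    nonzero = lam * lam / (lam + lam * lam)   # E[S_P]^2 / (lam + lam^2)
    return too_many, nonzero
```

For every $\lambda$ between 1 and 2 the difference `nonzero - too_many`, which
lower-bounds the probability that between 1 and 10 solutions survive, is at
least 3/10 and hence at least 1/4.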

Let $x \in S$ satisfy (6). Then there exists a $k = 0, 1, \ldots, n - 1$ such that
$\sum_{i=1}^{n} r_i x_i = u + Pk$. (Note that this is an equality over the
integers, not a modular congruence!) Again we can guess this value $k$ by
iterating over all $n$ possibilities. Put $a'_i = a_i + (nt + 1) r_i$ for
$i = 1, 2, \ldots, n$ and $t' = t + (nt + 1)(u + Pk)$.

Now observe that if $S$ is empty, then none of the $n^2$ instances $(a', t')$ has
solutions with probability 1. Conversely, if $S$ is nonempty, then at least one of
the instances $(a', t')$ has at least 1 and at most 10 solutions with probability
at least 1/4. By construction, $\log t' = O(n)$. Since the first stage gives $n$
outputs, the second stage gives $n^3$ outputs in total. □
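The second-stage transformation can be sketched as follows for one guess of
$(s, k)$. This is an illustrative sketch under stated assumptions: the names
`isolate`, `solutions` and `smallest_prime_at_least` are hypothetical, the prime
is chosen as the smallest one at least $2^s$ (the proof allows an arbitrary
choice in the interval), and the brute-force `solutions` helper exists only to
check the construction on toy inputs.

```python
import random
from itertools import product

def is_prime(m):
    """Trial division; adequate only for the small demo sizes used here."""
    return m >= 2 and all(m % i for i in range(2, int(m ** 0.5) + 1))

def smallest_prime_at_least(m):
    """Smallest prime >= m; it is at most 2m by Bertrand's postulate."""
    while not is_prime(m):
        m += 1
    return m

def isolate(a, t, s, k, rng):
    """Build (a', t') for the guess (s, k); assumes a_i <= t for all i."""
    n = len(a)
    P = smallest_prime_at_least(2 ** s)
    r = [rng.randrange(P) for _ in range(n)]
    u = rng.randrange(P)
    big = n * t + 1                    # exceeds every possible value of a.x
    a_new = [ai + big * ri for ai, ri in zip(a, r)]
    t_new = t + big * (u + P * k)
    return a_new, t_new, (P, r, u)

def solutions(a, t):
    """All 0/1 vectors x with a.x = t over the integers (brute force)."""
    return [x for x in product((0, 1), repeat=len(a))
            if sum(ai * xi for ai, xi in zip(a, x)) == t]
```

Since $nt + 1$ exceeds every subset sum of $a$, a vector solves $(a', t')$
exactly when it solves $(a, t)$ and also satisfies
$\sum_i r_i x_i = u + Pk$ over the integers.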

6. Parallelization 
In this section we prove Theorem 1.4, restated here for convenience.



Theorem 1.4 (restated). The algorithm of Theorem 1.3 can be implemented to run in
$O^*(2^{\tau(\sigma) n}/P)$ parallel time on $P$ processors each using
$O^*(2^{\sigma n})$ space, provided $P \le 2^{(2\tau(\sigma) - 1) n}$.

Proof. We divide the $P$ processors evenly among the roughly $2^{\beta n}$ choices
of $s'$. If $P \le 2^{\beta n}$, then this trivially gives full parallelization.
Otherwise, fix a choice of $s'$. We have $P' \approx P/2^{\beta n}$ processors
available to solve the instance restricted to this value of $s'$.

We now let each of the $P'$ available processors solve the left recursive call on
line 11 in full, independently of each other. Only in the right recursive call on
line 14 do we split up the task and use the $P'$ processors to get a factor $P'$
speedup, provided that $P'$ is not too large (cf. the theorem statement).

Let us write $\sigma_l$ and $n_l$ (resp. $\sigma_r$ and $n_r$) for the values of
$\sigma$ and $n$ on the left (resp. right) recursive branch. The left branch takes
time $O^*(2^{\tau(\sigma_l) n_l})$. By an inductive argument, if
$P' \le 2^{(2\tau(\sigma_r) - 1) n_r}$, then the right branch takes time
$O^*(2^{\tau(\sigma_r) n_r}/P')$. Indeed, to set up the induction, observe that in
the base case when $\sigma \ge 1/4$, there is nothing to prove, since the bound on
$P$ is then simply 1. The overall time taken is within a constant of the maximum
of these because the recursion depth is $O(1)$.

Thus to complete the proof it suffices to establish the inequalities

(7)    $\max\{2^{\tau(\sigma_l) n_l},\ 2^{\tau(\sigma_r) n_r}/P'\} \le 2^{\tau(\sigma) n}/P$,

(8)    $P' \le 2^{(2\tau(\sigma_r) - 1) n_r}$.

Let us start with (7). For the left branch, we have
$n_l = \alpha n = (1 - \tau(\sigma)) n$. Using the assumption that
$P \le 2^{(2\tau(\sigma) - 1) n}$ and the trivial bound $\tau(\sigma_l) \le 1$, we
see that $2^{\tau(\sigma_l) n_l} \le 2^{\tau(\sigma) n}/P$ as desired. For the
right branch, we have

    $n_r = (1 - \alpha) n = \tau(\sigma) n$,

    $\tau(\sigma_r) = \tau(\sigma/\tau(\sigma)) = \frac{2\tau(\sigma) - 1 + \sigma}{\tau(\sigma)}$,

where the last step uses Proposition 3.6. Thus,

    $\tau(\sigma_r) n_r = (2\tau(\sigma) - 1 + \sigma) n$,

and hence,

    $2^{\tau(\sigma_r) n_r}/P' = 2^{(2\tau(\sigma) - 1 + \sigma) n} / (P/2^{(1 - \tau(\sigma) - \sigma) n}) = 2^{\tau(\sigma) n}/P$.

It remains to establish (8). Because $P \le 2^{(2\tau(\sigma) - 1) n}$, it
suffices to show that

    $2^{(2\tau(\sigma) - 1) n} / 2^{(1 - \tau(\sigma) - \sigma) n} \le 2^{(2\tau(\sigma_r) - 1) n_r} = 2^{(4\tau(\sigma) - 2 + 2\sigma - \tau(\sigma)) n}$.

Canceling exponents on the left and on the right, everything cancels except for
one of the two $\sigma n$'s on the right. □
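The exponent bookkeeping in (7) and (8) is linear in $\tau(\sigma)$ and $\sigma$,
so it can be checked with exact arithmetic. The sketch below is only a sanity
check: the function name `exponents` is hypothetical, and the sample values used
to exercise it are arbitrary numbers rather than points known to lie on the
tradeoff curve.

```python
from fractions import Fraction

def exponents(tau, sigma):
    """Exponents (divided by n) that appear in inequalities (7) and (8)."""
    beta = 1 - tau - sigma             # log2(number of s' choices) / n
    right_work = 2 * tau - 1 + sigma   # tau(sigma_r) * n_r / n, as quoted above
    p_max = 2 * tau - 1                # largest allowed log2(P) / n
    lhs8 = p_max - beta                # log2(P') / n when P is as large as allowed
    rhs8 = 2 * right_work - tau        # (2 * tau(sigma_r) - 1) * n_r / n
    return beta, right_work, lhs8, rhs8
```

With these names, `right_work + beta` equals `tau`, which gives the right-branch
case of (7), and `rhs8 - lhs8` equals `sigma`, which is the leftover $\sigma n$
that makes (8) hold.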